fix: use iconv_strlen in kanji mode #200
|
Just a couple of thoughts:
|
1e7eea5 to 8bf099a
|
Please rebase against main to get the tests working. |
Fixes unreadable QR codes in Kanji mode with Shift-JIS encoding (see Bacon#172). PR Bacon#173 worked around the issue by forcing Byte mode, but that lost Kanji mode’s efficiency. The actual cause was using strlen() to count characters. Replacing it with iconv_strlen($content, 'utf-8') ensures correct character count and restores proper Kanji encoding.
|
Rebased. Please let me know what you would like to see improved (including commit messages) before considering this for merging. |
|
Looks like there's something wrong with your match syntax? |
39a90d6 to 00938aa
Codecov Report
✅ All modified and coverable lines are covered by tests.

@@             Coverage Diff              @@
##               main     #200     +/-   ##
============================================
+ Coverage     70.58%   70.81%   +0.22%
+ Complexity      995      994       -1
============================================
  Files            49       49
  Lines          3182     3169      -13
============================================
- Hits           2246     2244       -2
+ Misses          936      925      -11
|
|
LGTM, thanks! :) |
It's a shame, multiple expressions cannot be on separate lines… Furthermore, Mode::KANJI(), default => iconv_strlen($content, 'utf-8'), |
|
Actually, I just looked through the other modes – are we certain that they all require |
|
I haven’t really dug into the other modes, so I can’t speak with confidence there. My changes were scoped specifically to Kanji mode. Alternatively, it could look like this:

    $numLetters = match ($mode) {
        Mode::BYTE() => $dataBits->getSizeInBytes(),
        Mode::KANJI() => iconv_strlen($content, 'utf-8'),
        default => strlen($content),
    };

That said, I can’t say for sure whether this covers all cases correctly. |
|
Well, as Match on all expected modes coming from |
|
I had thought about this approach too (explicitly matching I had discarded it because any hypothetical future mode addition could suddenly cause breakage, hence the idea of a default/fallback But overall, that could still be an acceptable approach. Note that |
|
oh, UnhandledMatchError is totally fine, I didn't know about that. But if somebody modifies |
For NUMERIC and ALPHANUMERIC modes, strlen() is sufficient since these modes only operate on single-byte characters. strlen() runs in O(1) time and avoids the overhead of iconv_strlen().
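
To illustrate the point about single-byte modes, here is a minimal sketch with made-up sample strings (not taken from the library or its tests). It only shows that the byte count and the character count coincide for pure ASCII input, which is why `strlen()` is enough there:

```php
<?php
// Hypothetical sample inputs: digits for NUMERIC mode, and characters
// from the QR alphanumeric set for ALPHANUMERIC mode.
$numeric = '0123456789';
$alphanumeric = 'HELLO WORLD $%*+-./:';

// strlen() counts bytes, iconv_strlen() counts characters in the given charset.
$numericBytes = strlen($numeric);
$numericChars = iconv_strlen($numeric, 'utf-8');

$alphanumericBytes = strlen($alphanumeric);
$alphanumericChars = iconv_strlen($alphanumeric, 'utf-8');

// For ASCII-only content both counts agree, so the cheaper strlen() suffices.
var_dump($numericBytes === $numericChars);
var_dump($alphanumericBytes === $alphanumericChars);
```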
|
Oh, right — And since I’ve updated the PR accordingly, and also took the opportunity to rewrite |
Nicer, and more consistent with some other code in the encode() method. Invalid modes are really not expected here, as this private method is called only by encode(), and the mode is determined right at the beginning of encode() by calling chooseMode(). Though, if an invalid mode ever comes here, the match would throw an UnhandledMatchError.
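
A rough sketch of the behaviour described above, for readers who have not used `match` without a `default` arm. `DemoMode` and `countLetters()` are hypothetical stand-ins invented for this example (the library's real `Mode` class and private method are not reproduced here), and the native enum assumes PHP 8.1 or later:

```php
<?php
// Hypothetical stand-in for the library's Mode class (assumption, PHP 8.1+).
enum DemoMode
{
    case NUMERIC;
    case ALPHANUMERIC;
    case BYTE;
    case KANJI;
    case ECI; // deliberately left unhandled below
}

// Hypothetical helper: picks the letter count per mode, with no default arm.
function countLetters(DemoMode $mode, string $content, int $sizeInBytes): int
{
    return match ($mode) {
        DemoMode::NUMERIC, DemoMode::ALPHANUMERIC => strlen($content),
        DemoMode::BYTE => $sizeInBytes,
        DemoMode::KANJI => iconv_strlen($content, 'utf-8'),
    };
}

// Three Kanji characters, written as escapes so the file encoding does not matter.
$kanjiCount = countLetters(DemoMode::KANJI, "\u{65E5}\u{672C}\u{8A9E}", 9);

// An unexpected mode falls through every arm and raises UnhandledMatchError.
try {
    countLetters(DemoMode::ECI, '', 0);
    $caught = false;
} catch (\UnhandledMatchError $e) {
    $caught = true;
}

var_dump($kanjiCount, $caught);
```

The trade-off is the one discussed in this thread: a `default` arm would silently pick a counting strategy for any future mode, whereas omitting it makes an unexpected mode fail loudly.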
This fixes the unreadable QR code issue in Kanji mode when using Shift-JIS encoding (see #172).
PR #173 attempted to resolve this by forcing Byte mode, which worked but sacrificed the efficiency of Kanji mode.
In a comment on #173, I discovered the root cause: strlen() was incorrectly used to count characters, leading to misalignment in encoding.
Replacing it with iconv_strlen($content, 'utf-8') ensures accurate character counting for multibyte strings, preserving proper Kanji mode behavior and producing valid QR codes.
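
For reference, a small standalone sketch of the mismatch described above. The sample string is an assumption chosen for illustration and is not taken from the linked issue:

```php
<?php
// Hypothetical sample: three Kanji characters, written as Unicode escapes
// so the result does not depend on the source file's encoding.
$content = "\u{65E5}\u{672C}\u{8A9E}";

// strlen() returns the number of bytes; each of these characters takes 3 bytes in UTF-8.
$byteCount = strlen($content);

// iconv_strlen() returns the number of characters in the given charset.
$charCount = iconv_strlen($content, 'utf-8');

printf("bytes: %d, characters: %d\n", $byteCount, $charCount);
```

Writing the byte count where a character count is expected overstates the length of the Kanji segment, which is consistent with the unreadable codes reported in #172.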